Abstract: We describe the participation of the University of Amsterdam’s ILPS group in the blog track at TREC 2008. We mainly explored different ways of using external corpora to expand the original query. In the blog post retrieval task we did not succeed in improving over a simple baseline (equal weights for both the expanded and the original query). Obtaining optimal weights for the original and the expanded query remains a subject of investigation. In the blog distillation task we tried to improve over our (strong) baseline using external expansion, but due to differences in the run setup, comparing these runs is hard. Compared to a simpler baseline, we see an improvement for the run using external expansion on the combination of news, Wikipedia and blog posts.


1 Introduction


We describe our participation in this year’s TREC Blog track. As last year, the blog track consists of two separate tasks: blog post retrieval and blog distillation. Besides finding topically relevant blog posts, the blog post retrieval task has two further subtasks: finding blog posts that contain an opinion on the given topic, and determining the polarity of that opinion. To test the opinion-ranking capabilities of participants’ systems, participants were asked to rerank five baseline runs based on opinionatedness, in addition to submitting four full opinion retrieval runs. Our main interest this year lies in the topical retrieval of both blog posts and blogs. We did not participate in the polarity task and only submitted very basic opinion finding runs.

The remainder of this paper introduces our retrieval approaches for both tasks in Section 2 and explains how we incorporated external sources in query modeling in Section 3. We then zoom in on the runs for both tasks and their results: post retrieval in Section 5 and blog distillation in Section 6. Finally, we conclude in Section 7.


2 Retrieval Approaches

In the blog post retrieval task we use an out-of-the-box implementation of Indri. Results from previous years showed good overall performance of Indri compared to other systems; moreover, it allows for easy use of query models (queries consisting of weighted terms).
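Indri accepts weighted queries through its `#weight` operator, which is what makes query models convenient to use. The sketch below serializes a query model, given as a mapping from terms to weights, into a `#weight` query string. The helper name `indri_weight_query` is hypothetical, and it assumes the terms are already plain tokens that need no escaping.

```python
from __future__ import annotations


def indri_weight_query(model: dict[str, float], precision: int = 4) -> str:
    """Serialize a query model {term: weight} as an Indri #weight query.

    Hypothetical helper: assumes terms are plain tokens (no Indri operators
    or punctuation) and that weights are non-negative.
    """
    # Highest-weighted terms first; the order does not affect Indri's
    # weighted sum, it only makes the query easier to read.
    ordered = sorted(model.items(), key=lambda kv: kv[1], reverse=True)
    body = " ".join(f"{weight:.{precision}f} {term}" for term, weight in ordered)
    return f"#weight( {body} )"


print(indri_weight_query({"obama": 0.6, "election": 0.4}))
# prints: #weight( 0.6000 obama 0.4000 election )
```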

In the blog distillation task we use our in-house expert retrieval model (Balog et al., 2006), which we adapted to the task of blogger retrieval (Balog et al., 2008; Weerkamp et al., 2008). The main reason for using this model is that we believe blog distillation should be solved using a post index (as opposed to a full blog index). Although last year’s blog track showed good performance of blog indexes, we stick to a post index for three reasons: (i) a post index allows for easy incremental updating, (ii) posts are a natural unit for presenting results to the user, and, most importantly, (iii) only one index is needed for both post retrieval and blog distillation.

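We estimate the probability of a blog $blog$ generating query $Q$ as follows:

$$P(Q \mid \theta_{blog}) = \prod_{t \in Q} P(t \mid \theta_{blog})^{n(t,Q)} \tag{1}$$

where $n(t,Q)$ is the number of times term $t$ occurs in $Q$. Next, we smooth the probability of a term given a blog with the background probabilities:

$$P(t \mid \theta_{blog}) = (1 - \lambda_{blog}) \cdot P(t \mid blog) + \lambda_{blog} \cdot P(t) \tag{2}$$

Finally, we estimate $P(t \mid blog)$ as follows:

$$P(t \mid blog) = \sum_{post \in blog} P(t \mid post, blog) \cdot P(post \mid blog) \tag{3}$$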

We assume that the term and the blog are conditionally independent given the post, thus $P(t \mid post, blog) = P(t \mid post)$, and approximate $P(t \mid post)$ with the standard maximum likelihood estimate. In Section 6 we detail our choices for estimating $P(post \mid blog)$.
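To make Eqs. (1)-(3) concrete, the sketch below scores one blog for a query in log space. It assumes posts are given as token lists, a uniform $P(post \mid blog)$ when no weights are supplied (Section 6 describes feature-based estimates instead), and an arbitrary default $\lambda_{blog} = 0.5$; the function name and defaults are illustrative, not the settings used in our runs.

```python
import math
from collections import Counter


def blog_log_likelihood(query, posts, background, lam=0.5, post_weights=None):
    """log P(Q | theta_blog) following Eqs. (1)-(3).

    query:        list of query terms; repeats give n(t, Q) > 1
    posts:        list of posts of one blog, each a list of terms
    background:   dict term -> P(t) in the whole collection
    lam:          smoothing weight lambda_blog (illustrative default)
    post_weights: P(post | blog) per post; uniform if None (assumption)
    """
    if post_weights is None:
        post_weights = [1.0 / len(posts)] * len(posts)
    post_counts = [Counter(p) for p in posts]
    post_lengths = [len(p) for p in posts]

    log_p = 0.0
    for term, n_tq in Counter(query).items():
        # Eq. (3), with P(t|post, blog) = P(t|post) as a maximum likelihood estimate
        p_blog = sum(
            w * counts[term] / length
            for counts, length, w in zip(post_counts, post_lengths, post_weights)
            if length > 0
        )
        # Eq. (2): interpolate with the background model
        p_smoothed = (1 - lam) * p_blog + lam * background.get(term, 0.0)
        if p_smoothed == 0.0:
            return float("-inf")  # term unseen everywhere: probability zero
        # Eq. (1): the product over query terms becomes a sum of logs
        log_p += n_tq * math.log(p_smoothed)
    return log_p
```

Working in log space avoids numerical underflow when many query terms are multiplied; ranking blogs by this value gives the same order as ranking by Eq. (1) itself.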

3 Query Modeling

For both tasks we experimented with query models built from external corpora. In short, we assume that documents in the target collection (the blog collection) are too noisy to generate good query models based on blind relevance feedback. Instead, we use different, less noisy external corpora for expanding our original query. As much of what goes on in the blogosphere is driven by news events, we use the contemporary news corpus AQUAINT-2¹ as an external corpus. In addition, many queries directed at blogs and blog posts contain named entities (persons, locations, organizations, products) or general concepts (especially in blog distillation). We therefore also use Wikipedia as an external corpus, since it contains focused information on many general concepts and named entities.

For two post retrieval runs we use Lavrenko’s relevance model 2 (Lavrenko and Croft, 2001) to select the top 10 terms from the top 10 external documents. After selecting these weighted expansion terms, we construct the final query model $P(t \mid \theta_Q)$ by combining the expanded query model $P(t \mid \hat{\theta}_Q)$ with the original query $P(t \mid Q)$:

$$P(t \mid \theta_Q) = \lambda_Q \cdot P(t \mid \hat{\theta}_Q) + (1 - \lambda_Q) \cdot P(t \mid Q) \tag{4}$$
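The sketch below shows Eq. (4) for query models stored as term-to-probability dictionaries. It assumes a maximum likelihood estimate for $P(t \mid Q)$ and that the expanded model is truncated to its highest-weighted terms and renormalized before mixing; the expansion weights themselves (for example from relevance model 2) are taken as given, and all function names are hypothetical.

```python
from __future__ import annotations


def top_k(model: dict[str, float], k: int = 10) -> dict[str, float]:
    """Keep the k highest-weighted terms and renormalize (assumed step)."""
    kept = sorted(model.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(w for _, w in kept)
    return {t: w / total for t, w in kept}


def original_query_model(query_terms: list[str]) -> dict[str, float]:
    """Maximum likelihood estimate of P(t|Q) from the query terms."""
    model: dict[str, float] = {}
    for t in query_terms:
        model[t] = model.get(t, 0.0) + 1.0 / len(query_terms)
    return model


def interpolate(expanded: dict[str, float], original: dict[str, float],
                lam_q: float = 0.5) -> dict[str, float]:
    """Eq. (4): lam_q * P(t|expanded model) + (1 - lam_q) * P(t|Q)."""
    terms = set(expanded) | set(original)
    return {t: lam_q * expanded.get(t, 0.0) + (1 - lam_q) * original.get(t, 0.0)
            for t in terms}


final = interpolate(top_k({"oil": 0.4, "opec": 0.4, "barrel": 0.2}, k=2),
                    original_query_model(["oil", "price"]))
# final == {"oil": 0.5, "opec": 0.25, "price": 0.25} (dict order may vary)
```

With $\lambda_Q = 0.5$ (the setting of our baseline post retrieval run), original query terms that also appear among the expansion terms receive weight from both components.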


In two opinion retrieval runs and two blog distillation runs we use an experimental approach to query expansion. We estimate the probability of an expansion term $t$ given the query $Q$ and a set of external corpora $C$:

$$P(t \mid Q, C) = \sum_{c \in C} P(t \mid c, Q) \cdot \frac{P(c \mid Q)}{\sum_{c' \in C} P(c' \mid Q)} \tag{5}$$
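Eq. (5) is a mixture of per-corpus expansion models, weighted by how well each corpus matches the query after normalizing $P(c \mid Q)$ over all corpora in $C$. A minimal sketch, assuming the per-corpus models $P(t \mid c, Q)$ and the scores $P(c \mid Q)$ have already been computed; the corpus labels in the example are placeholders.

```python
from __future__ import annotations


def combine_corpora(term_models: dict[str, dict[str, float]],
                    corpus_probs: dict[str, float]) -> dict[str, float]:
    """Eq. (5): sum_c P(t|c,Q) * P(c|Q) / sum_c' P(c'|Q).

    term_models:  corpus label -> {term: P(t|c,Q)}
    corpus_probs: corpus label -> P(c|Q)
    """
    z = sum(corpus_probs.get(c, 0.0) for c in term_models)
    if z == 0.0:
        return {}  # no corpus matches the query at all
    combined: dict[str, float] = {}
    for c, model in term_models.items():
        weight = corpus_probs.get(c, 0.0) / z  # normalized P(c|Q)
        for t, p in model.items():
            combined[t] = combined.get(t, 0.0) + p * weight
    return combined


mixed = combine_corpora({"news": {"a": 1.0}, "wiki": {"a": 0.5, "b": 0.5}},
                        {"news": 0.25, "wiki": 0.75})
```

Here `mixed` assigns 0.625 to `a` and 0.375 to `b`: the corpus with the higher $P(c \mid Q)$ dominates the expansion terms.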


We estimate $P(t \mid c, Q)$ based on the probability of document $D$ given the query and corpus, and the probability of term $t$ given the document:

$$P(t \mid c, Q) = \sum_{D \in c \,:\, P(D \mid Q, c) > 0} P(t \mid D) \cdot P(D \mid Q, c) \tag{6}$$
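Eq. (6) can be read as a document-weighted sum of maximum likelihood term distributions, restricted to documents with non-zero probability. The sketch assumes documents are token lists keyed by a (hypothetical) document id and that the $P(D \mid Q, c)$ scores are supplied.

```python
from __future__ import annotations

from collections import Counter


def term_given_corpus(docs: dict[str, list[str]],
                      doc_scores: dict[str, float]) -> dict[str, float]:
    """Eq. (6): sum over D with P(D|Q,c) > 0 of P(t|D) * P(D|Q,c).

    docs:       document id -> list of terms (one corpus c)
    doc_scores: document id -> P(D|Q,c)
    """
    model: dict[str, float] = {}
    for doc_id, p_doc in doc_scores.items():
        if p_doc <= 0.0:
            continue  # Eq. (6) only sums over documents with P(D|Q,c) > 0
        terms = docs[doc_id]
        for t, n in Counter(terms).items():
            # maximum likelihood P(t|D) = n(t, D) / |D|
            model[t] = model.get(t, 0.0) + (n / len(terms)) * p_doc
    return model
```

As written, the resulting weights sum to the total document mass $\sum_D P(D \mid Q, c)$ rather than to one; for ranking candidate expansion terms this makes no difference, and the weights can be renormalized if a proper distribution is needed.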


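Next, we estimate $P(D \mid Q, c)$, the probability of document $D$ given corpus $c$ and query $Q$:

$$P(D \mid Q, c) = \prod_{q \in Q} P(q \mid D) \cdot \frac{n(Q, D) \cdot |Q|}{|D|} \tag{7}$$

where $n(Q, D)$ is the number of occurrences of the query $Q$ as a phrase in document $D$, and $P(q \mid D) = n(q, D) \cdot |D|^{-1}$. Finally, we estimate the probability of corpus $c$ given query $Q$ as the average of $P(D \mid Q, c)$ over all documents in $c$ with non-zero probability:

$$P(c \mid Q) = \frac{\sum_{D \in c \,:\, P(D \mid Q, c) > 0} P(D \mid Q, c)}{\left|\{D \in c : P(D \mid Q, c) > 0\}\right|} \tag{8}$$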
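A sketch of Eqs. (7) and (8) for tokenized documents follows. It reads Eq. (7) as the product of the query-term probabilities times a phrase factor $n(Q, D) \cdot |Q| / |D|$, and counts $n(Q, D)$ as every position where the full query occurs as a contiguous token sequence (overlapping matches included); both the phrase-matching rule and the function names are assumptions for illustration.

```python
from __future__ import annotations

from collections import Counter


def phrase_count(query: list[str], doc: list[str]) -> int:
    """n(Q, D): positions where the whole query occurs as a contiguous phrase."""
    k = len(query)
    if k == 0:
        return 0
    return sum(1 for i in range(len(doc) - k + 1) if doc[i:i + k] == query)


def doc_given_query(query: list[str], doc: list[str]) -> float:
    """Eq. (7): prod_q P(q|D) * n(Q, D) * |Q| / |D|, with P(q|D) = n(q, D) / |D|."""
    if not doc:
        return 0.0
    counts = Counter(doc)
    p = 1.0
    for q in query:
        p *= counts[q] / len(doc)
    return p * phrase_count(query, doc) * len(query) / len(doc)


def corpus_given_query(query: list[str], corpus: list[list[str]]) -> float:
    """Eq. (8): mean of P(D|Q, c) over documents with P(D|Q, c) > 0."""
    positive = [s for s in (doc_given_query(query, d) for d in corpus) if s > 0]
    return sum(positive) / len(positive) if positive else 0.0
```

Under this reading, a document that contains all query terms but never the exact phrase gets $P(D \mid Q, c) = 0$ and is therefore excluded from both Eq. (6) and Eq. (8).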


4 Metrics and Significance


5 Blog Post Retrieval

As explained in Section 2, we use an out-of-the-box implementation of Indri as our retrieval system. Runs are evaluated on two topic sets: the new 2008 topics alone, and the full set of 150 topics (2006-2008).

We submitted 6 runs:

(A) uams08n1o1: the baseline run; uses a news corpus for query expansion with $\lambda_Q = 0.5$ (i.e., equal weights for the expanded and the original query) and assigns priors to posts based on credibility.

(B) uams08n1o1sp: identical to the previous run, but with an “opinionatedness prior”.

(C) uams08class: query expansion using both a news corpus and Wikipedia; $\lambda_Q$ trained on the 2006 and 2007 topics, and priors based on credibility indicators.

(D) uams08clspr: identical to the previous run, but with an “opinionatedness prior”.

(E) uams08qm4it1: query expansion following Section 3 on a news corpus and Wikipedia.

(F) uams08qm4it2: identical to the previous run, but with the blog post corpus as an additional source.

We experiment with estimating $\lambda_Q$ based on old topics: for each of the old (2006/2007) topics we know the performance, in terms of MAP, of various parameter settings (weights of different corpora). We use this information as follows: for each unseen topic $t'$ we assign a similarity score to each seen topic $t$ based on the overlapping documents in their result lists. Next, we multiply this overlap score by the MAP of each mixture setting and determine the “optimal” mixture weights this way. This method is used in runs uams08class and uams08clspr.
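The per-topic weight estimation can be sketched as follows. The overlap measure and the aggregation over seen topics are not specified above, so this sketch assumes the fraction of shared documents in the top `depth` results and sums the overlap-weighted MAP over all seen topics before picking the best setting; all names, the default depth, and the tuple encoding of a setting (one weight per corpus) are hypothetical.

```python
from __future__ import annotations


def overlap(run_a: list[str], run_b: list[str], depth: int = 100) -> float:
    """Fraction of shared documents in the top-`depth` results (assumed measure)."""
    return len(set(run_a[:depth]) & set(run_b[:depth])) / depth


def estimate_mixture(new_run: list[str],
                     seen_runs: dict[str, list[str]],
                     seen_map: dict[str, dict[tuple[float, ...], float]],
                     depth: int = 100) -> tuple[float, ...]:
    """Return the mixture setting with the highest overlap-weighted MAP.

    seen_runs: seen topic -> ranked document ids retrieved for that topic
    seen_map:  seen topic -> {mixture setting: MAP of that setting on the topic}
    """
    totals: dict[tuple[float, ...], float] = {}
    for topic, run in seen_runs.items():
        sim = overlap(new_run, run, depth)
        for setting, score in seen_map[topic].items():
            totals[setting] = totals.get(setting, 0.0) + sim * score
    return max(totals, key=totals.get)


best = estimate_mixture(
    ["d1", "d2", "d3"],
    {"t1": ["d1", "d2", "x"], "t2": ["y", "z", "d3"]},
    {"t1": {(0.7, 0.3): 0.20, (0.3, 0.7): 0.10},
     "t2": {(0.7, 0.3): 0.05, (0.3, 0.7): 0.30}},
    depth=3,
)
```

In this toy example the unseen topic overlaps more with `t1`, yet the setting that is very strong on `t2` still wins, which illustrates how a few similar topics can dominate the estimate.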

Four of our runs (A-D) also use credibility priors: based on a combination of 6 credibility indicators (Weerkamp and de Rijke, 2008), we estimate the prior probability of a blog post being relevant. Since all these runs use the same priors, we cannot determine their effectiveness here, but they have proven successful before (Weerkamp and de Rijke, 2008).

For opinion retrieval, we explore the use of an “opinionatedness prior.” To construct this prior we use strongly opinionated terms from the OpinionFinder system² and calculate for each post the ratio of opinionated terms to the total number of terms. We apply this prior on top of our two baseline runs uams08n1o1 and uams08class, resulting in runs uams08n1o1sp and uams08clspr.
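The opinionatedness prior is a simple ratio, sketched below for a tokenized post. The lexicon is a toy placeholder for the OpinionFinder term list, and the way the prior is combined with a retrieval score (adding its logarithm, with a small epsilon so posts without opinion terms are not discarded) is an assumption, since that combination is not specified here; both function names are hypothetical.

```python
import math


def opinion_prior(post_terms, lexicon):
    """Ratio of strongly opinionated terms to all terms in a post."""
    if not post_terms:
        return 0.0
    hits = sum(1 for t in post_terms if t in lexicon)
    return hits / len(post_terms)


def apply_prior(log_score, prior, eps=1e-6):
    """Assumed combination: log P(Q|D) + log P(D), smoothed by eps."""
    return log_score + math.log(prior + eps)


toy_lexicon = {"love", "awful"}  # placeholder, not the OpinionFinder list
print(opinion_prior("i love this awful phone".split(), toy_lexicon))  # prints 0.4
```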


In this paper we report on mean average precision (MAP), precision at 5 and 10 documents (P5, P10), and mean reciprocal rank (MRR). We use the Wilcoxon signed-rank test to test for significant differences between runs, and report on significant increases (or drops) for p < .01 and for p < .05.
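For reference, the per-topic quantities behind MAP, P5/P10 and MRR can be computed as below; averaging them over topics gives the reported means. This is a plain sketch assuming binary relevance and one ranked list of document ids per topic; official TREC figures are produced with trec_eval.

```python
def average_precision(ranking, relevant):
    """AP for one topic: precision at each relevant document's rank, averaged."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def precision_at(ranking, relevant, k):
    """P@k: fraction of the top k results that are relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def reciprocal_rank(ranking, relevant):
    """1 / rank of the first relevant document (0 if none is retrieved)."""
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def mean(values):
    """MAP and MRR are plain means of the per-topic values."""
    return sum(values) / len(values)
```

For the significance test, the per-topic scores of two runs (in the same topic order) can be passed as paired samples to a Wilcoxon signed-rank implementation such as `scipy.stats.wilcoxon(x, y)`, which returns the test statistic and the p-value.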

¹ http://trec.nist.gov/data/qa/2007_qadata/qa.07.guidelines.html#documents


5.1 Results and Discussion

From the results in Tables 1 and 2 we make three initial observations: (i) The runs using the method for combining external corpora introduced in Section 3 (i.e., uams08qm4it1 and uams08qm4it2) perform significantly worse than the runs using relevance models and a linear combination of the expanded and original query (uams08n1o1 and uams08class). (ii) Looking at the runs using relevance models to construct query models (uams08n1o1 and uams08class), we see that estimating the relative importance of the original query is not easy: the simple baseline approach ($\lambda_Q = 0.5$) outperforms the slightly more advanced per-topic estimation. (iii) The runs using opinion priors (uams08n1o1sp and uams08clspr) significantly outperform their baseline counterparts in terms of MAP, not only on opinion retrieval but also on topical retrieval.

² http://www.cs.pitt.edu/mpqa/

| Run | MAP | P5 | P10 | MRR |
|---|---|---|---|---|
| All topics | | | | |
| uams08n1o1 | 0.3329 | 0.5987 | 0.5693 | 0.7309 |
| uams08n1o1sp | (illegible) | 0.6040 | 0.5687 | 0.7275 |
| uams08class | 0.3297 | 0.5840 | 0.5660 | 0.7377 |
| uams08clspr | 0.3323† | 0.5853 | 0.5647 | 0.7349 |
| uams08qm4it1 | 0.2633† | 0.4747† | 0.4620† | 0.6007† |
| uams08qm4it2 | 0.1969† | 0.3480† | 0.3587† | 0.4539† |
| 2008 topics | | | | |
| uams08n1o1 | 0.3797 | 0.7080 | 0.6620 | 0.8052 |
| uams08n1o1sp | 0.3823† | 0.7120 | 0.6580 | 0.8052 |
| uams08class | 0.3685 | 0.6680 | 0.6420 | 0.7852 |
| uams08clspr | 0.3715† | 0.6640 | 0.6400 | 0.7852 |
| uams08qm4it1 | 0.2927† | 0.5360† | 0.5300† | 0.6567† |
| uams08qm4it2 | 0.2122† | 0.4120† | 0.4120† | 0.5431† |

Table 1: Opinion results on the blog post retrieval task. Significance of uams08clspr and uams08n1o1sp is tested against their baselines; other runs are tested against the first run, uams08n1o1. † marks a significant difference (p < .05).


6 Blog Distillation

Our blog distillation model allows us to estimate the importance of individual posts to a blog, i.e., the association strength between posts and their blog ($P(post \mid blog)$ in Eq. 3). Based on previous experiments (Weerkamp et al., 2008) and additional tests on the 2007 topics, we use a combination of blog features to estimate this association strength: post length, recency, and number of comments. On top of this, we noticed that information from the post title is an important indicator of relevance in the blog distillation task. To use this information, we linearly combine a run on the full post index with a run on a title-only index. This combined run is our baseline run, uams08bl.
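The combination of the full-post and title-only runs is a linear interpolation of blog scores. A minimal sketch, assuming each run is a mapping from blog id to score, that a blog missing from one run gets 0 from that run, and an illustrative weight of 0.5 (the weight used in uams08bl is not given here). Optional min-max normalization is included because raw scores from different indexes need not share a scale; whether it matches the submitted run is an assumption, and the function names are hypothetical.

```python
from __future__ import annotations


def min_max(run: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; a run with constant scores maps to 0."""
    if not run:
        return {}
    lo, hi = min(run.values()), max(run.values())
    span = hi - lo
    return {b: (s - lo) / span if span else 0.0 for b, s in run.items()}


def combine_runs(full_run: dict[str, float], title_run: dict[str, float],
                 weight: float = 0.5, normalize: bool = False) -> list[tuple[str, float]]:
    """Rank blogs by weight * full-post score + (1 - weight) * title-only score."""
    if normalize:
        full_run, title_run = min_max(full_run), min_max(title_run)
    blogs = set(full_run) | set(title_run)
    combined = {b: weight * full_run.get(b, 0.0) + (1 - weight) * title_run.get(b, 0.0)
                for b in blogs}
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
```

For example, `combine_runs({"b1": 1.0, "b2": 0.5}, {"b2": 1.0})` ranks `b2` (0.75) above `b1` (0.5): a strong title match can lift a blog that is weaker on the full post text.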

We again experiment with expansion on external corpora using the novel method introduced in Section 3. In run uams08nw we use the news corpus and Wikipedia; in run uams08pnw we also use the post index as an external corpus. The difference with the baseline is that we do not use the combination with the title-only index: for this submission we want to look at the influence of the query expansion alone, and the scores of the two runs (the expanded run and the title-only run) lie in very different ranges, calling for other, more suitable ways of combining them.

| Run | MAP | P5 | P10 | MRR |
|---|---|---|---|---|
| All topics | | | | |
| uams08n1o1 | 0.4350 | 0.7680 | 0.7480 | 0.8464 |
| uams08n1o1sp | 0.4366† | 0.7667 | 0.7473 | 0.8419 |
| uams08class | 0.4313 | 0.7507 | 0.7493 | 0.8439 |
| uams08clspr | 0.4332† | 0.7520 | 0.7473 | 0.8441 |
| uams08qm4it1 | 0.3627† | 0.6800† | 0.6713† | 0.7780† |
| uams08qm4it2 | 0.2745† | 0.5760† | 0.5740† | 0.6869† |
| 2008 topics | | | | |
| uams08n1o1 | 0.4644 | 0.8040 | 0.7620 | 0.8892 |
| uams08n1o1sp | 0.4661† | 0.8000 | 0.7620 | 0.8892 |
| uams08class | 0.4494 | 0.7680 | 0.7480 | 0.8358 |
| uams08clspr | 0.4513† | 0.7720 | 0.7500 | 0.8408 |
| uams08qm4it1 | 0.3734† | 0.6720† | 0.6600† | 0.8052† |
| uams08qm4it2 | 0.2606† | 0.5480† | 0.5380† | 0.6981† |

Table 2: Topical results on the blog post retrieval task. Significance of uams08clspr and uams08n1o1sp is tested against their baselines; other runs are tested against the first run, uams08n1o1. † marks a significant difference (p < .05).

The final run we submitted, uams08nonr, is highly experimental. An important aspect of the blog distillation task is to return not just blogs that mention the topic, but blogs that mention it often. In that sense, we want to determine not only the relevance of a blog for a given topic, but also its non-relevance for that topic (i.e., its relevance to other topics). We tried to estimate this from the performance of blogs on the 2007 topics, which we use as an indicator of non-relevance (assuming the 2008 topics differ from the 2007 topics): the relevance score of a blog (Eq. 1) is divided by the average relevance score of that blog on all 2007 topics. A blog with a high relevance score on the current topic and low relevance scores on other topics thus gets a score (and rank) boost.
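The non-relevance normalization can be sketched as a simple reranking step. This assumes the scores are probabilities from Eq. (1) rather than log-likelihoods (dividing negative log scores would not behave as intended) and that every blog has at least one positive 2007 score; the function name is hypothetical.

```python
def nonrelevance_rerank(scores, old_scores):
    """Rank blogs by score / mean score on the old (2007) topics.

    scores:     blog id -> P(Q | theta_blog) for the current topic (Eq. 1)
    old_scores: blog id -> list of P(Q' | theta_blog) over the 2007 topics
    """
    ratios = {}
    for blog, score in scores.items():
        history = old_scores.get(blog, [])
        if not history or sum(history) <= 0.0:
            raise ValueError(f"blog {blog!r} has no positive 2007 scores")
        mean_old = sum(history) / len(history)
        # High relevance now and low relevance elsewhere gives a large ratio
        ratios[blog] = score / mean_old
    return sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)
```

A blog that scores highly on almost every topic (for example, a very long, general blog) gets a large denominator, so its ratio drops even when its raw score on the current topic is high.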

Summarizing, we submitted the following four runs, and added one extra baseline run (B) to our results table:

(A) uams08bl: $P(post \mid blog)$ based on number of comments, post length, and recency; combination of a title+body run and a title-only run.

(B) baseline: identical to the previous run, but without the combination with a title-only run.

(C) uams08nw: identical to the previous run, but with query expansion following Section 3 on a news corpus and Wikipedia.

(D) uams08pnw: identical to the previous run, but with the blog post corpus as an additional external corpus.

(E) uams08nonr: ratio of relevance to non-relevance of a blog.


| Run | MAP | P5 | P10 | MRR |
|---|---|---|---|---|
| baseline | 0.2567 | 0.4480 | 0.4180 | 0.7298 |
| uams08bl | 0.2638† | 0.4600 | 0.4200 | 0.7294 |
| uams08nonr | 0.0257† | 0.1000† | 0.0900† | 0.2393† |
| uams08nw | 0.2489 | 0.4080 | 0.3660 | 0.6515 |
| uams08pnw | 0.2620 | 0.4080 | 0.3900 | 0.6303† |

Table 3: Results on the blog distillation task. Significance tested against baseline. † marks a significant difference (p < .05).


6.1 Results and Discussion

The results of our submitted runs, plus the evaluation of one additional run, are presented in Table 3. They show several interesting things: (i) The experimental run using “non-relevance” fails completely, indicating that we need different ways of incorporating this notion of non-relevance. (ii) Our baseline (uams08bl) is a strong baseline and is not beaten by the other runs (except on MRR, by the plain baseline: 0.7298 vs. 0.7294). (iii) Query expansion on news, Wikipedia and blog posts (uams08pnw) improves over the plain baseline in terms of MAP (0.2620 vs. 0.2567), but still performs below the combination with the titles (uams08bl, 0.2638).


7 Conclusions

In this year’s participation in the blog track we mainly explored different ways of using external corpora to expand the original query. In the blog post retrieval task we did not succeed in improving over a simple baseline (equal weights for both the expanded and the original query), and a thorough analysis is needed to find out why this did not work. For the same task, further investigation is needed to determine the effectiveness of the credibility priors and to better understand the effect of applying the opinion prior.

In the blog distillation task we tried to improve over our (strong) baseline using external expansion. Since this baseline also uses information from the title explicitly, it is hard to determine why the expanded runs do not improve over it. Compared to a baseline without the title component, we see an improvement for the run using expansion on the combination of news, Wikipedia and blog posts. For this task, further research is needed into the combination of the title and full-post components, as well as into their combination with expanded queries. The run that tried to capture the non-relevance of a blog failed, but exploring this area further could lead to significant improvements over a baseline that looks only at “relevance.”

Finally, looking at the two tasks combined, we see that query expansion is much more effective on the blog distillation task than on the blog post retrieval task. Further analysis is needed to find out why this difference occurs.